Sanskrit signs and Pāṇinian scripts

Author

  • Gérard Huet

Abstract

We discuss ways of understanding the Pāṇinian grammatical tradition of Sanskrit in computationally tractable ways. We propose to dissociate the formal expression of the locutor’s communicative intention (expressed as a composition of sign combinators called a script), from its justification (using Pāṇinian rules and meta-rules). Computation consists then in evaluating a Pāṇinian script to its final sign, delivering both the correct enunciation, and its meaning expressed as a non-ambiguous paraphrase.

1 Computational linguistics and the Aṣṭādhyāyī

It is now recognized as an undisputed fact that Pāṇini was a genius linguist 25 centuries before linguistics was established as a scientific discipline in Europe by de Saussure, and that his Aṣṭādhyāyī is a very complete and precise grammar of Sanskrit. This scholarly consensus must be distinguished from opinions stated in various social media, claiming that Pāṇini’s Aṣṭādhyāyī is a faultless computer program, and that Sanskrit is the perfect programming language of the future. Usually such hyperbolic assertions (atiśayokti) are not backed up by any argumentative justification. It has also been claimed that Pāṇini invented the Backus-Naur form of context-free grammars. This originates from a 1967 note in a computer journal by Peter Ingerman (Ingerman, 1967) without any precise evidence. Such uninformed anachronistic judgements are misleading, and just add confusion to the debate around the actual contribution of Pāṇini to formal computation and information theory besides linguistic modeling.

Actually, even if it is far-fetched to recognize a context-free grammar description in Pāṇini’s grammar, it is a fact that many formal description mechanisms are explicit in the Aṣṭādhyāyī.
For instance, external sandhi operations are defined by sūtras of a standardized form which may be unambiguously decoded as algebraic rewrite rules of the form: [x] u|v → w, with x, u, v, w ∈ Σ*, where Σ denotes the set of phonemes (varṇa) of Sanskrit. The encoding uses Sanskrit morphology (vibhakti) to discriminate the fields of a record encoding the 4-tuple of strings x, u, v and w that are the parameters to the rewrite rule (Cardona, 1974; Bhate and Kak, 1993). The rule may be read as a computation procedure to rewrite a juxtaposition of u and v in the input string as string w in a left context x. That is, XxuvY may be rewritten as XxwY for any strings X and Y. If we further specify that rewriting is done uniformly in a left-to-right fashion, we get indeed a vikāra algorithm (vidhikalpa) that applies (external) sandhi to strings of phonemes in order to transform a list of isolated words (padapāṭha) into a continuous enunciation (saṃhitāpāṭha). It is easy to relate such rules to contemporary morpho-phonetic rules in computational linguistics, building on the theory of regular relations in formal language theory (Kaplan and Kay, 1994; Koskenniemi, 1984). Indeed, such Pāṇinian rules may be directly fed into the finite state toolkits implementing this paradigm (Huet, 2005; Hyman, 2009). This sort of mechanism may be applied as well to vowel grade shift (guṇa, vṛddhi), vowel harmony, etc.

The situation is more complex for generative morphology, where word construction from morphemes and affixes uses retroflexion, which needs for its specification a non-regular operation, where the left context must be inspected on an unbounded, although generally small, suffix. Indeed, many Pāṇinian rules are of a more complex nature, involving context-free and even context-sensitive formulations. Furthermore, the “flow of control” of Pāṇinian rules, including rules of a meta-linguistic nature, is a complex affair, and it is not possible to regard the Aṣṭādhyāyī directly as a computer program whose instructions would be the sūtras. Actually, part of the problem is the conciseness (lāghava) of its description, a very important concern since the grammar had to be exactly memorized by the traditional students. We may rather think of the Aṣṭādhyāyī as a high-level program compiled into a low-level machine code, where techniques of compaction such as sharing have been applied to obtain a low memory footprint, at the expense of control complexity. Indeed, the advent of printing allowed equivalent reformulations of the grammar in more hierarchical ways, presumably easier for the student to use, but at the expense of duplication of rules (Dīkṣita et al., 1905).

It remains that Pāṇini is the ultimate authority, and that the perfection of his description induced a prescriptive nature of the grammar, seen as the gold standard of Sanskrit, following Patañjali’s magisterial commentary (Joshi and Roodbergen, 1990; Filliozat, 1975). This explains the stability of the language, since it could evolve only within the constraints of the grammar. Thus further commentaries were reduced to settling matters of detail, and to elucidating the flow of control of the grammar usage (Sharma et al., 2008; Joshi and Roodbergen, 2004; Sharma, 1987). Thus Pāṇini’s Aṣṭādhyāyī is often (justly) referred to as a generative grammar for Sanskrit. Actually, when challenged, a competent (śiṣṭa) Sanskrit locutor should be able to exhibit the sequence of Pāṇinian sūtras (prakriyā) validating his linguistic productions. Indeed, such systematic sequences have been worked out for the various examples discussed in traditional grammars (Grimal et al., 2006). Thus it would seem possible in principle to write a simulator of Pāṇinian derivations which would take sūtras as instructions and derive Sanskrit strings guaranteed by construction to be correct Sanskrit.

2 Using the Aṣṭādhyāyī in generation

There have indeed been attempts to write such a simulator as a computer program that would progressively elaborate a target Sanskrit utterance as a sequence of operations on a string of phonemes, some ending up as phonetic material, others being meta-linguistic markers (anubandha) which are progressively eliminated when the operation they trigger is effected. See for instance the work of Anand Mishra (Mishra, 2009; Mishra, 2010), of Peter Scharf (Scharf, 2009), and of Pawan Goyal et al. (Goyal et al., 2009).

The first remark to be made is that the Aṣṭādhyāyī is not self-sufficient. It must be used together with specialized lexicons, one giving roots with derivational markers (dhātupāṭha), another one giving lists of words sharing morphological characteristics (gaṇapāṭha), still others listing attested genders of substantives (liṅgānuśāsana) (Cardona, 1976). Access to these resources is triggered by root or stem selection. One practical problem is to decide which version of these resources to use, since the lexical lists are open-ended and have been amended or reorganised since Pāṇini’s time.

Another difficulty is that one must check that a rule application is indeed permitted at the time of its invocation. This requires maintaining complex data structures storing the derivation history, verifying context conditions implicitly carried over from one sūtra to the next (anuvṛtti), and analysing complex priority rules between sūtras (siddha, asiddhavat), which are not always consensual among experts. Also, certain sūtras are subject to semantic conditions (rule X is valid for root R “in the sense of ...”) which are not directly amenable to computation. Aspects of this control problem, and their relation with computational devices, have been discussed in (Goyal et al., 2009).
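
To make concrete the simplest kind of string operation such a simulator performs, the rewrite-rule reading [x] u|v → w of external sandhi described in Section 1 can be sketched in Python. This is only an illustrative sketch under stated assumptions: the names `SandhiRule`, `RULES`, `join` and `sandhi` are hypothetical, each character of the transliteration is treated as one phoneme, and the three toy rules are simplified stand-ins, not a faithful encoding of the corresponding sūtras.

```python
from typing import List, NamedTuple


class SandhiRule(NamedTuple):
    """One rule [x] u|v -> w (hypothetical record type for this sketch)."""
    x: str  # left context, left unchanged
    u: str  # end of the left word
    v: str  # start of the right word
    w: str  # replacement for the juxtaposition u|v


# Toy rule table (an assumption of this sketch), tried in order.
RULES = [
    SandhiRule("a", "n", "a", "nna"),  # final n after short a doubles before a
    SandhiRule("", "a", "a", "ā"),     # a + a -> ā
    SandhiRule("", "a", "i", "e"),     # a + i -> e
]


def join(left: str, right: str, rules: List[SandhiRule] = RULES) -> str:
    """Rewrite X x u | v Y as X x w Y using the first applicable rule."""
    for r in rules:
        if left.endswith(r.x + r.u) and right.startswith(r.v):
            return left[: len(left) - len(r.u)] + r.w + right[len(r.v):]
    # No rule applies: keep the words apart (a simplification).
    return left + " " + right


def sandhi(words: List[str], rules: List[SandhiRule] = RULES) -> str:
    """Apply the rules left to right over a list of isolated words."""
    if not words:
        return ""
    result = words[0]
    for word in words[1:]:
        result = join(result, word, rules)
    return result


print(sandhi(["na", "asti"]))          # nāsti
print(sandhi(["tatra", "ca", "iti"]))  # tatra ceti
print(sandhi(["āsan", "atra"]))        # āsannatra
```

The sketch deliberately ignores everything that makes the control problem hard: there is no derivation history, no anuvṛtti, and no priority among rules beyond their order in the table.
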
Finally, many rules specifying optional operations are non-deterministic in nature (with a long history of discussions on the optionally/preferably interpretations (Kiparsky, 1980)).

These difficulties lead one to believe that the Aṣṭādhyāyī can be used to generate an enunciation S only if both S and its intended meaning are known in advance. And there might still be choices in the application of rules which must be made explicit if one wants to obtain a deterministic simulation. The rules discuss both forms and meanings. However, the grammar cannot be construed to generate meaning from correct enunciations (think of śleṣa ambiguity), nor correct enunciations from meaning (since there are many ways to say the same thing, especially in a language with flexible word order). Rules have conditions both on the surface realisation (phonemic strings) of the considered enunciation and on its intended meaning. Any attempt to explain generativity in a unidirectional way runs into circularities (itaretarāśrayadoṣa). As Peter Scharf puts it: “The rules do not actually generate the speech forms in certain meanings; they instruct one that it is correct to use certain speech forms in certain meanings” (Scharf, 2009).

The solution to these difficulties is to make explicit oracle decisions fixing all these choices, and to consider that the derivation process operates not just on surface material (strings of phonemes and markers) but on signs in the sense of de Saussure, that is, pairs of enunciations and of their meanings. This will be possible if we identify precisely the semantic combinators implicit in the derivational process. The derivational process ought to derive not just the target final enunciation, but also a formal expression representing its sense, or some disjunction of possible senses, when some ambiguity remains.
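
The idea of evaluating a script of sign combinators to a final sign can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the excerpt does not specify the actual combinators, so `Sign`, `Word`, `Apply`, `evaluate` and the two combinators `agent` and `object` are hypothetical, the meaning is a plain paraphrase string, and enunciations are simply joined with a space, leaving sandhi aside.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass(frozen=True)
class Sign:
    """A sign: a pair of an enunciation and its meaning."""
    enunciation: str
    meaning: str


@dataclass(frozen=True)
class Word:
    """Script leaf: a sign chosen explicitly by the locutor."""
    sign: Sign


@dataclass(frozen=True)
class Apply:
    """Script node: a named combinator applied to two sub-scripts."""
    combinator: str
    argument: "Script"
    head: "Script"


Script = Union[Word, Apply]


def _with_role(role: str) -> Callable[[Sign, Sign], Sign]:
    """Build a combinator attaching `argument` to `head` under a role label."""
    def combine(argument: Sign, head: Sign) -> Sign:
        return Sign(
            f"{argument.enunciation} {head.enunciation}",
            f"{head.meaning} [{role}: {argument.meaning}]",
        )
    return combine


# Hypothetical combinator table; the role names are placeholders.
COMBINATORS: Dict[str, Callable[[Sign, Sign], Sign]] = {
    "agent": _with_role("agent"),
    "object": _with_role("object"),
}


def evaluate(script: Script) -> Sign:
    """Evaluate a script bottom-up to its final sign."""
    if isinstance(script, Word):
        return script.sign
    argument = evaluate(script.argument)
    head = evaluate(script.head)
    return COMBINATORS[script.combinator](argument, head)


script = Apply(
    "agent",
    Word(Sign("rāmaḥ", "Rāma")),
    Apply("object", Word(Sign("pustakaṃ", "a book")), Word(Sign("paṭhati", "reads"))),
)
final = evaluate(script)
print(final.enunciation)  # rāmaḥ pustakaṃ paṭhati
print(final.meaning)      # reads [object: a book] [agent: Rāma]
```

Because every choice is written into the script, evaluation is deterministic: the same script always yields the same pair of enunciation and paraphrase, and justifying each step by rules is a separate concern.
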

Similar resources

Pāṇinian Linguistics

1 Pāṇini’s grammar. Pāṇini’s grammar (ca. 500 B.C.) seeks to provide a complete, maximally concise, and theoretically consistent analysis of Sanskrit grammatical structure. It is the foundation of all traditional and modern analyses of Sanskrit, as well as having great historical and theoretical interest in its own right. Western grammatical theory has been influenced by it at every stage of...

a-headers from the Aṣṭādhyāyī in Sanskrit literature from the perspective of corpus linguistics

The paper presents strategies for evaluating the influence of Pāṇini’s Aṣṭādhyāyī on the vocabulary of Sanskrit. Using a corpus linguistic approach, it examines how the Pāṇinian sample words are distributed over post-Pāṇinian Sanskrit, and if we can determine any lexicographic influence of the Aṣṭādhyāyī on later Sanskrit. The primary focus of the paper lies on data exploration, becau...

Tagging Classical Sanskrit Compounds

The paper sets out a prima facie case for the claim that the classification of Sanskrit compounds in Pāṇinian tradition can be retrieved from a very slight augmentation of the usual enriched context free rules.

Modelling the Grammatical Circle of the Paninian System of Sanskrit Grammar

In the present article we briefly sketch an extended version of our previously developed model for computer representation of the Pāṇinian system of Sanskrit grammar. We attempt to implement an antecedent analytical phase using heuristical methods and improve the subsequent phase of reconstitution using the rules of Aṣṭādhyāyī by incorporating strategies for automatic application of gramma...

Analyzing English Phrases from Pāṇinian Perspective

This paper explores Pāṇinian Grammar (PG) as an information processing device in terms of ‘how’, ‘how much’ and ‘where’ languages encode information. PG is based on a morphologically rich language, Sanskrit. We apply PG on English and see how the Pāṇinian perspective would deal with it from the information theoretical point of view and its effectiveness in machine translation. We analyze En...

Publication date: 2015